Abstract

In our series of projects for accumulating sequence information on the coding sequences of
unidentified human genes, we have newly determined the sequences of 100 cDNA clones from a set of
size-fractionated human brain cDNA libraries, and predicted the coding sequences of the
corresponding genes, named KIAA0711 to KIAA0810. These cDNA clones were selected according to
their potential to encode large proteins (50 kDa and more) in vitro. The average sizes of the
inserts and corresponding open reading frames were 4.3 kb and 2.6 kb (869 amino acid residues),
respectively. Sequence analyses against the public databases indicated that the predicted coding
sequences of 78 genes were similar to those of known genes, 64% of which (50 genes) were
categorized as proteins functionally related to cell signaling/communication, cell
structure/motility and nucleic acid management. As additional information concerning the genes
characterized in this study, the chromosomal locations of the clones were determined by using
human-rodent hybrid panels, and the expression profiles among 10 human tissues were examined by
reverse transcription-coupled polymerase chain reaction, the quantification of which was
substantially improved by an enzyme-linked immunosorbent assay.
Key words: large proteins; in vitro transcription/translation; cDNA sequencing; expression
profile; chromosomal location; brain



1. Introduction

As a complement to human genome sequencing,1 analysis of cDNAs is expected to provide
indispensable information for the interpretation of genomic sequences. In particular, it is very
advantageous that cDNA clones convey more unambiguous protein-coding information than genomic
clones and that they can be used as versatile reagents in functional studies of genes.
Considering the importance of cDNA analysis, we began sequencing the entire length of cDNAs in
order to accumulate information on the coding sequences of unidentified human genes.2 Recently,
we have focused our sequencing efforts on the analysis of large cDNAs (> 4 kb) encoding large
proteins (> 50 kDa) in brain, since these genes are likely to play an important role in
mammals.3,4 As an extension of the preceding reports, we herein present the entire sequences,
expression profiles among 10 human tissues and chromosomal locations of 100 new cDNA clones.
Furthermore, specific features of the newly predicted protein sequences are described on the
basis of the homology/motif analysis.

2. Materials and Methods

2.1. Source and screening of cDNA clones

The size-fractionated human brain cDNA libraries Nos. 2 to 5 (average insert size = 3.9, 4.5,
5.3 and 6.1 kb, respectively)5 were used as a source of cDNA clones. Most of the cDNA clones
analyzed in this study were selected from library No. 2 (average insert size = 3.9 kb). cDNA
clones were first screened according to their in vitro protein-coding potentials and then by
single-pass sequencing of both termini as described previously.3 The sequences thus determined
were subjected to homology search against the GenBank database (release 102.0) excluding
expressed sequence tags and genomic sequences. The clones with unidentified sequences at both
ends were sequenced in their entirety as described.3 As an exception, cDNA clones which are
likely to contain much larger open reading frames (ORFs) than those already registered in the
public databases were [...]

2.2. Gene expression profiles

Expression patterns of newly identified genes in 10 human tissues were examined by reverse
transcription-coupled polymerase chain reaction (RT-PCR) as described previously,4 except that
the detection and quantification of the PCR products were done by enzyme-linked immunosorbent
assay (ELISA). For ELISA, the RT-PCR was modified to be conducted in the presence of
digoxigenin (DIG)-11-dUTP (DIG PCR labeling mix, Boehringer Mannheim, Germany) while other
conditions were unchanged. The nucleotide sequences of the respective PCR primers are available
upon request. The DIG-labeled PCR products obtained were quantified with a PCR ELISA kit from
Boehringer Mannheim. Since the kit takes advantage of solution hybridization for specific
detection of the desired PCR products, authentic biotinylated products were prepared from the
isolated cDNA clones by PCR in the presence of 0.1 mM biotin-14-dATP (Life Technologies, Inc.,
USA), purified on agarose gels, and then used as probes for the solution hybridization. The
RT-PCR ELISA was performed exactly as described in the instructions provided with the kit,
except for the following points: because biotinylated PCR products were used as probes in place
of oligomers, the hybridization was carried out in 210 µl of hybridization solution containing
20 ng of the probe at 65 °C for 16 hr; after the hybridization, biotin-labeled molecules were
captured in a well of a streptavidin-coated microtiter plate at 53 °C for 30 min. The color
development with horseradish peroxidase and 2,2'-azino-bis(3-ethylbenzthiazoline-6-sulfonate),
the final detection step of the RT-PCR ELISA, was monitored by absorption at 405 nm in a kinetic
mode with a SPECTRAmax 250 microtiter plate reader (Molecular Device, Co., Sunnyvale, CA) at
37 °C. The ELISA data were converted to mRNA levels, expressed as equivalent amounts of the cDNA
plasmid, on the basis of ELISA control curves obtained with PCR products derived from serial
dilutions of a known amount of the authentic plasmids, using the software package SOFTmax PRO
(Molecular Device, Co.). The digitized mRNA levels were then displayed by color codes to
facilitate the survey of many gene expression profiles at a glance.
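
The conversion from ELISA readings to plasmid-equivalent amounts can be illustrated with a small
sketch. The original analysis was done in SOFTmax PRO, and the curve model it applied is not
stated here; the code below assumes, purely for illustration, a straight line in log-log space
fitted to the serial-dilution standards. The function names, the example readings and the
color-bin edges are all hypothetical.

```python
import math


def fit_standard_curve(amounts_fg, signals):
    """Least-squares line through (log10 signal, log10 amount).

    Assumption: the control curve is linear in log-log space. The model
    actually used by the plate-reader software is not given in the text.
    """
    xs = [math.log10(s) for s in signals]
    ys = [math.log10(a) for a in amounts_fg]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def signal_to_amount(signal, slope, intercept):
    """Convert one ELISA reading to a plasmid-equivalent amount (fg)."""
    return 10 ** (slope * math.log10(signal) + intercept)


def color_bin(amount_fg, edges=(0.1, 1.0, 10.0, 100.0)):
    """Index of the color bin for an amount; the bin edges are hypothetical."""
    return sum(amount_fg >= edge for edge in edges)


# Hypothetical standards: plasmid amounts (fg) and their kinetic readings.
slope, intercept = fit_standard_curve([0.1, 1.0, 10.0], [1.0, 10.0, 100.0])
estimate_fg = signal_to_amount(50.0, slope, intercept)
print(estimate_fg, color_bin(estimate_fg))
```

Binning the converted amounts, rather than the raw readings, is what allows one color scale to be
shared across genes, since each gene is calibrated against its own plasmid standard.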

2.3. Other methods

DNA sequencing and homology search of the predicted protein-coding sequences were carried out as
described previously,3,4 except that most DNA sequencing reactions were performed using the ABI
PRISM™ dRhodamine terminator cycle sequencing ready reaction kit (Perkin-Elmer Co., USA).
Plasmid DNAs for sequencing and in vitro transcription/translation were prepared by the Wizard
Plus SV Minipreps DNA Purification system (Promega Corp., Madison, WI), except that the spin
columns were replaced with MultiScreen-FB plates (Millipore Corp., Bedford, MA) to adapt the
system to the 96-well format. When the possibility of spurious interruption of ORFs was noticed,
the region likely to cause the interruption was amplified by RT-PCR and then examined by DNA
sequencing as described previously.5 Chromosomal locations of newly identified genes were
determined using human-rodent hybrid panels, GeneBridge 4 (Research Genetics Inc., USA),6 if
their mapping data were not available in the UniGene database
(http://www.ncbi.nlm.nih.gov/UniGene/index.html). For genes whose chromosomal locations were
described in the UniGene database, we did not perform the radiation hybrid mapping experiments
as a general rule. The actual primer sequences and the reaction conditions used for PCR-assisted
chromosomal mapping are accessible through the World Wide Web at http://www.kazusa.or.jp.

3. Results and Discussion

3.1. Sequence analysis and prediction of protein-coding regions of cDNA clones

The criteria for cDNA clone selection were the same as reported in the previous studies: the
clones must be uncharacterized and able to direct the synthesis of proteins larger than 50 kDa
in vitro. One hundred clones thus selected were subjected to sequencing of their entire inserts.
Most of the cDNA clones (87 clones) were derived from library No. 2 (average insert size of
3.9 kb) in this study, since cDNA clones in this library had not yet been extensively
characterized. As described previously,5 some clones were found to carry spurious coding
interruptions: four clones (KIAA0799-0801 and KIAA0810) were found to carry relatively long
insertions, probably corresponding to intronic sequences; the ORFs in 5 clones (KIAA0803,
KIAA0804 and KIAA0807-0809) were frame-shifted by insertion or deletion of a small number of
nucleotide residues; the ORFs in 3 clones (KIAA0802, KIAA0805 and KIAA0806) were interrupted by
relatively short insertions (152, 94 and 188 bp, respectively). For those genes, the sequences
revised by the RT-PCR experiments, not the actual cloned cDNA sequences, were deposited in the
GenBank/EMBL/DDBJ databases and used for prediction of protein-coding sequences. The sequence
data revealed that the average sizes of these cDNA inserts and of their ORFs were 4.3 kb and
2.6 kb (corresponding to 869 amino acid residues), respectively. Physical maps of the 100 cDNA
clones analyzed are shown in Fig. 1, where the ORFs and the first ATG codons in the respective
ORFs are indicated by solid boxes and triangles, respectively. In-frame termination codons
upstream of the first ATG codon were identified in 41 clones, among which 33 clones carried the
ATG codon within the context of Kozak's rule.7 In Fig. 1, short interspersed nu-




Figure 1. Physical maps of cDNA clones analyzed. The physical maps shown here were constructed
on the basis of the sequence data of respective cDNA clones. The horizontal scale represents the
cDNA length in kb, and the gene numbers corresponding to respective cDNAs are given on the left.
The ORFs and untranslated regions are shown by solid and open boxes, respectively. The positions
of the first ATG codons with or without the contexts of Kozak's rule are indicated by solid and
open triangles, respectively. RepeatMasker, which is a program that screens DNA sequences for
interspersed repeats known to exist in mammalian genomes, was applied to detect repeat sequences
in respective cDNA sequences (Smit, A. F. A. and Green, P., RepeatMasker at
http://ftp.genome.washington.edu/RM/RepeatMasker.html). Short interspersed nucleotide elements
(SINEs) including Alu and MIR sequences and other repetitive sequences thus detected are
represented by dotted and hatched boxes, respectively.

Table 1. Information on sequence data and chromosomal locations of the identified genes.

a) Accession numbers of DDBJ, EMBL and GenBank databases.

b) Values excluding poly(A) sequences.

c) Chromosome numbers identified by using the GeneBridge 4 radiation hybrid panel unless
specified. The chromosomal locations highlighted by asterisks were fetched from the UniGene
database.

d) cDNA and ORF lengths were revised by direct analysis of the RT-PCR products.

Table 2. Functional classifications of the gene products based on homologies to known proteins
and sequence motifs.

[Table body not recoverable. The legible rows at the end of the table list genes with no
homologous database entry ("none"): KIAA0712, KIAA0730, KIAA0732, KIAA0741, KIAA0766, KIAA0773,
KIAA0775, KIAA0784, KIAA0792 and KIAA0802.]



a) Classifications based on the annotations of their homologous protein entries in the databases.

b) The gene products were grouped into four similarity classes according to the sequence
identities obtained by the GAP program: I, identical to known human gene products (sequence
identity, > 90%); H, homologous to known non-human gene products (sequence identity, > 90%);
R, related to some known gene products (sequence identity, 30 to 90%); W, very weakly related to
known gene products (sequence identity, < 30%).

c) Organisms in which these entries were identified are given in parentheses: B, bovine;
Ce, Caenorhabditis elegans; D, Drosophila melanogaster; Eh, Entamoeba histolytica; Fr, Fugu
rubripes; H, human; M, mouse; Nc, Neurospora crassa; Oc, Oryctolagus cuniculus; R, rat;
Rb, rabbit; Sc, Saccharomyces cerevisiae; Sp, Schizosaccharomyces pombe; X, Xenopus laevis.

d) Accession numbers of homologous entries in the DDBJ/EMBL/GenBank/OWL/SWISS-PROT/PIR databases
are shown.

e) The values were obtained by the FASTA program.
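
The four similarity classes defined in footnote b) amount to a simple threshold rule, which can
be sketched as follows. This is an illustrative reading of the footnote, not code from the
study: the function name is hypothetical, and the treatment of values lying exactly on a
boundary (90% and 30% are both placed in class R here) is an assumption, since the footnote does
not say how ties were handled.

```python
def similarity_class(identity_percent, is_human):
    """Assign a Table 2 similarity class from a GAP sequence identity.

    identity_percent: overall sequence identity (%) to the best known entry.
    is_human: True if that entry is a human gene product.
    Boundary values (exactly 90 or exactly 30) fall into class R by assumption.
    """
    if identity_percent > 90:
        # Above 90% the class depends only on the species of the known entry.
        return "I" if is_human else "H"
    if identity_percent >= 30:
        return "R"
    return "W"


# Hypothetical example values, not taken from Table 2.
for identity, human in [(98.5, True), (93.0, False), (47.1, False), (22.0, False)]:
    print(identity, human, similarity_class(identity, human))
```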



cleotide elements (Alu and MIR sequences) and other repetitive sequences detected by using the
RepeatMasker program are also displayed by dotted and hatched boxes, respectively. Table 1 lists
the gene codes (KIAA numbers), the accession numbers of the nucleotide sequences in the
GenBank/EMBL/DDBJ databases, the sizes of the cDNA inserts and the identified ORFs, and the
chromosomal locations of the respective genes. The chromosomal locations of 37 genes, which are
highlighted by asterisks, were fetched from the UniGene database while the remaining 63
chromosomal locations were experimentally determined in this study.
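
The three ORF annotations reported in Section 3.1 and drawn in Fig. 1 (the first ATG codon of an
ORF, an in-frame termination codon upstream of it, and the Kozak context of that ATG) can be
sketched as below. The exact criterion applied for "the context of Kozak's rule" is not spelled
out in this section; the sketch assumes the common reading of a purine at position -3 or a G at
position +4. The function names and the example sequence are hypothetical, and the ORF
boundaries are taken as given.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_atg_in_orf(seq, orf_start, orf_end):
    """Index of the first in-frame ATG within [orf_start, orf_end), or -1."""
    for i in range(orf_start, orf_end - 2, 3):
        if seq[i:i + 3] == "ATG":
            return i
    return -1


def has_upstream_inframe_stop(seq, atg_pos):
    """True if a stop codon lies upstream of the ATG in the same frame."""
    for i in range(atg_pos - 3, -1, -3):
        if seq[i:i + 3] in STOP_CODONS:
            return True
    return False


def in_kozak_context(seq, atg_pos):
    """Assumed rule: purine at -3 or G at +4 relative to the A of the ATG."""
    minus3 = seq[atg_pos - 3] if atg_pos >= 3 else ""
    plus4 = seq[atg_pos + 3] if atg_pos + 3 < len(seq) else ""
    return minus3 in ("A", "G") or plus4 == "G"


# Hypothetical 15-nt example: stop codon TAG, then GCC ACC ATG GCG.
seq = "TAGGCCACCATGGCG"
atg = first_atg_in_orf(seq, 3, len(seq))
print(atg, has_upstream_inframe_stop(seq, atg), in_kozak_context(seq, atg))
```

An upstream in-frame stop codon matters because it argues that the first ATG marks the true
start of the coding region and that the clone is not truncated at its 5' end.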

3.2. Functional classification of predicted gene products

By homology and motif searches against DNA, protein, and protein-motif databases [GenBank
(release 108.0), OWL (release 30.3), and PROSITE (release 15.0) databases] using the Wisconsin
Sequence Analysis Package™ (version 8; Genetics Computer Group, Inc., USA), the predicted coding
sequences of 78 genes were found to exhibit significant similarities to those of known genes,
and 64% of them were classified as proteins functionally related to cell
signaling/communication, cell structure/motility and nucleic acid management. The results of the
functional classification of these newly identified genes on the basis of this homology/motif
analysis are summarized in Table 2. Interesting features to be noted are summarized below.

1. Except for the C2H2-type zinc finger protein family, 21 newly identified genes constitute 18
independent paralogous groups together with the genes characterized in our cDNA project
(Table 3). In this case, genes which exhibit significant similarities throughout the
protein-coding sequences, not only in distinct domains or motifs, have been assigned as those
with a paralogous relationship. Among these paralogous groups, genes in 7 groups were
"uncharacterized."

2. Several gene products exhibited similarities to synaptic proteins. Three of them (products of
KIAA0735, KIAA0736 and KIAA0743) were human counterparts of rat synaptic vesicle protein 2B,18
transport protein-like protein p87 (synaptic vesicle protein 2)19 and neurexin III,11
respectively. In addition, KIAA0768 and KIAA0786

Table 3. The newly identified genes in paralogous relationship with genes characterized by our
cDNA project.

New gene   Paralogous gene   Accession no.a)   Identities (%)b)   Corresponding gene product
KIAA0715   KIAA0566          AB011138          40.8               calcium-transporting ATPase
KIAA0716   KIAA0200          D86964            37.3               DOCK180 protein
           KIAA0299          AB002297          5S.8               DOCK180 protein
KIAA0717   KIAA0740c)        AB018283          66.G               uncharacterized
KIAA0722   KIAA0623          AB014523          15-6*              protein kinase ULK1 (ref. 9)
KIAA0728   KIAA0465          AB007934          71.5               spectrin α (ref. 10)
KIAA0737   KIAA0808c)        AB018351          43.2               uncharacterized
KIAA0743   KIAA0578          AB011150          77.2               neurexin III (ref. 11)
KIAA0744   KIAA0288          AB006626          52.8               uncharacterized
KIAA0751   KIAA0340          AB002338          60.0               rab3 effector RIM (ref. 12)
KIAA0755   KIAA0079          D38555            55.0               uncharacterized
KIAA0756   KIAA0343          AB002341          48.6               neurofascin (ref. 13)
KIAA0763   KIAA0522          AB011091          64.6               uncharacterized
KIAA0768   KIAA0786c)        AB018329          61.2               latrophilin-related protein 1
KIAA0782   KIAA0580          AB011152          42.9               uncharacterized
KIAA0795   KIAA0132          D50922            33.7               ring canal protein (ref. 10)
KIAA0805   KIAA0435          AB00789S          47.4               pecanex protein
KIAA0807   KIAA0303          AB002301          58.5               serine/threonine protein kinase MAST205
           KIAA0561          AB011133          60.7               serine/threonine protein kinase MAST205
KIAA0810   KIAA0668          AB014568          38.0               uncharacterized

a) Accession numbers of paralogous genes in the DDBJ/EMBL/GenBank databases are shown.

b) The values of the overall identities of amino acid residues were obtained by the GAP program.

c) These genes are reported in this paper.




Figure 2. Comparison of the RT-PCR ELISA method with the conventional RT-PCR method coupled with
gel electrophoresis. RT-PCR products of 10 previously reported genes (ref. 4) were analyzed by
gel electrophoresis and RT-PCR ELISA. Ten microliters of RT-PCR products were analyzed on 2.5%
NuSieve GTG agarose gels (FMC BioProducts, USA) and stained with ethidium bromide as described
previously.4 In each of the gel images, the first 5 lanes were used for external control
reaction products, which allow us to estimate the PCR amplification efficiency. The RT-PCR ELISA
data were expressed as amounts (fg) of the corresponding cDNA plasmids in 1 ng of starting
poly(A)+ RNAs by color codes using the conversion panel shown on the left side of this figure.
RT-PCR products independently prepared were used for gel electrophoresis and ELISA. KIAA gene
names are given between the gel images and the ELISA color codes.



which is a member of the secretin family of G protein-coupled receptors. Besides these genes and
KIAA0686 previously reported, the KIAA0758 gene product also exhibited weak similarity to rat
LPH1



[...] product exhibited structural similarity in the overall region of rat RIM, which is a
putative Rab3 effector in regulating synaptic-vesicle fusion.12 In accord with their [...]

Figure 3. Expression profiles of 100 newly identified genes in 10 different tissues examined by
RT-PCR ELISA. The tissue expression levels of 100 human genes newly identified in this study
were analyzed by the RT-PCR ELISA. Gene names are given as KIAA numbers at the left side of each
set of color codes. Tissue names are indicated above the top sets of color codes.

than other human tissues as revealed by RT-PCR
ELISA experiments described below.

3. Protein motif search against the PROSITE database revealed that protein kinase signatures are
present in two genes (KIAA0787 and KIAA0807) and that two genes (KIAA0760 and KIAA0798)
contained multiple C2H2-type zinc finger domains in the deduced coding regions. The Dbl homology
domain is present in the KIAA0720-encoded protein in addition to 10 such genes previously
reported (KIAA0006, KIAA0142, KIAA0294, KIAA0337, KIAA0362, KIAA0380, KIAA0382, KIAA0424,
KIAA0521 and KIAA0651).

3.3. Expression profiles of predicted genes

In this series of human cDNA analyses, we have used RT-PCR for exploring mRNA expression
patterns of newly identified genes. Although this method allows us to detect even a trace amount
of mRNA with minimum consumption of poly(A)+ RNA, an obvious drawback of the method is its poor
quantitative performance, particularly when the signals are analyzed by gel electrophoresis
followed by staining with a fluorescent dye. To address this problem, we newly introduced an
ELISA-based procedure for quantification of specific PCR products in place of the gel
electrophoresis-based one. Figure 2 compares the ELISA outputs of RT-PCR analyses of 10
previously identified genes with the gel images we had reported before.4 Although both
procedures gave essentially the same expression patterns for these 10 genes in a rough sense, it
was evident that quantitative characteristics of tissue expression patterns were more clearly
seen by the RT-PCR ELISA. Therefore, we decided to obtain the expression patterns of genes
reported in this study by the RT-PCR ELISA method. Although the introduction of the
ELISA-assisted procedure in the detection of PCR products made the quantitative characteristics
of the data more evident, the expression profiles thus obtained may still contain the artifacts
inherent in the conventional RT-PCR-based method. Since we could monitor the amounts of only a
relatively short cDNA region flanked by the PCR oligomers, alterations in transcript structure
such as alternative splicing or alternative poly(A) addition could not be discriminated from
changes in transcription levels. In addition, since the measurements were not replicated due to
limitations of cost and labor, run-to-run variation could not be completely excluded. Taking
these possibilities into account, we confined the use of these expression profiles to that of
important clues in the search for biologically interesting genes.

Figure 3 shows the expression patterns of the 100 newly identified genes reported in this study.
By using color codes instead of numerals for displaying the expression levels, the screening of
genes according to their tissue specificity was greatly facilitated. Since the expression levels
are given as equivalent weights of the corresponding plasmid cDNAs, it is possible, at least in
principle, to roughly compare the expression levels of a gene among 10 different tissues as well
as those of genes within a particular tissue. Since these genes were identified in a human brain
cDNA library, it is not surprising that dark blue color codes were hardly seen in the column of
the brain. In contrast, the pancreas and the spleen columns contained many dark blue color
codes, which suggested that a number of the genes actively expressed in the brain were dormant
in these tissues. These expression profiles offer another line of information required for
discovering biologically important genes among those characterized in this project.